{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Installing Python Packages on Server without Internet (Experimental)\n",
    "\n",
    "**WARNING: The feature introduced in this tutorial is currently experimental. It does not have any API stability guarantee.**\n",
    "\n",
    "In many environments, access from the database server to the Internet is disabled for security reasons. This makes it hard to install the Python packages required for data analytics on server.\n",
    "\n",
    "To overcome this limitation, GreenplumPython provides a function `Database.install_packages()` to help the user\n",
    "\n",
    "1. Download Python packages from a PyPI site to the client;\n",
    "1. Pack and upload the downloaded packages to the database server;\n",
    "1. Install the uploaded Python packages on server.\n",
    "\n",
    "All these happen automatically and the user only need to declare what packages are needed.\n",
    "\n",
    "In this way, as long as there is a database connection on a client with Internet access, the user can easily install the required packages, even if the database server cannot access the Internet by itself.\n",
    "\n",
    "**NOTE: This function only installs packages on the server host that GreenplumPython directly connects to. If your database server spreads across multiple hosts, additional operations are required to make the packages available on all hosts.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## (Optional) Prerequisite: Sharing Python Environments in a Cluster with NFS\n",
    "\n",
    "Setting up a NFS mount makes it easier to share a Python environment on multiple hosts and containers.\n",
    "\n",
    "This is important for distributed database systems such as [Greenplum](https://greenplum.org/) because otherwise the same set of packages needs to be copied to every host in the cluster.\n",
    "\n",
    "### Starting an NFS server\n",
    "\n",
    "First, we need to install and start an NFS server on one host. As an example, for Greenplum, we can start it on the coordinator host.\n",
    "\n",
    "For how to do this, please refer to the documentation of the OS. For example, if you are using [Rocky Linux](https://rockylinux.org/), you might want to refer to [the NFS page](https://docs.rockylinux.org/guides/file_sharing/nfsserver/).\n",
    "\n",
    "### Mount a Python environment with NFS on Each Host\n",
    "\n",
    "Next, we can mount a Python environment with NFS and share it to all hosts in the cluster. \n",
    "\n",
    "In this way, we only need to install the packages on one host and the packages will be made available to all other hosts as well through NFS.\n",
    "\n",
    "**WARNING: This will affect all applications on the hosts. Please make sure that the database server is the only application that uses Python.**\n",
    "\n",
    "**WARNING: This will hide all the files originally at the mount point. Please re-install them if they are needed by the database server.** "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "total 8\n",
      "drwxrwxr-x. 2 gpadmin gpadmin 4096 Oct  8 03:17 bin\n",
      "drwxrwxr-x. 3 gpadmin gpadmin   21 Oct  8 03:17 etc\n",
      "drwxrwxr-x. 2 gpadmin gpadmin    6 Oct  7 23:32 include\n",
      "drwxrwxr-x. 3 gpadmin gpadmin   23 Oct  7 23:32 lib\n",
      "lrwxrwxrwx. 1 gpadmin gpadmin    3 Oct  7 23:32 lib64 -> lib\n",
      "-rw-rw-r--. 1 gpadmin gpadmin   80 Oct  8 03:47 pyvenv.cfg\n",
      "drwxrwxr-x. 6 gpadmin gpadmin   65 Oct  8 03:17 share\n"
     ]
    }
   ],
   "source": [
    "! python3 -m venv /tmp/test_venv\n",
    "! sudo mount -t nfs \"$(hostname):/tmp/test_venv\" \"$(python3 -m site --user-base)\"\n",
    "! ls -l \"$(python3 -m site --user-base)\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now Python environment is mounted at the [Python user base directory](https://docs.python.org/3/library/site.html#site.USER_BASE) as an NFS. \n",
    "\n",
    "This means all packages installed with `pip` later will be available to all hosts with the NFS mounted.\n",
    "\n",
    "Please note that if there is more than one hosts in the cluster, the commands above needs to be executed on **each** of them.\n",
    "\n",
    "For example, if you are using Greenplum, this can be done by executing the commands in a `gpssh` session.\n",
    "\n",
    "Note that the NFS can be unmounted by"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! sudo umount \"$(python3 -m site --user-base)\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example: A UDF requiring a Third-Party Package\n",
    "\n",
    "It is very common for a UDF that depends on a package that is not in the [Python Standard Library](https://docs.python.org/3/library/index.html). We can write one as a very simple example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/home/gpadmin/GreenplumPython\n",
      "Processing /home/gpadmin/GreenplumPython\n",
      "  Installing build dependencies ... \u001b[?25ldone\n",
      "\u001b[?25h  Getting requirements to build wheel ... \u001b[?25ldone\n",
      "\u001b[?25h    Preparing wheel metadata ... \u001b[?25ldone\n",
      "\u001b[?25hCollecting dill==0.3.6\n",
      "  Using cached dill-0.3.6-py3-none-any.whl (110 kB)\n",
      "Collecting psycopg2-binary==2.9.5\n",
      "  Using cached psycopg2_binary-2.9.5-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)\n",
      "Building wheels for collected packages: greenplum-python\n",
      "  Building wheel for greenplum-python (PEP 517) ... \u001b[?25ldone\n",
      "\u001b[?25h  Created wheel for greenplum-python: filename=greenplum_python-1.0.1-py3-none-any.whl size=75737 sha256=88ea8683b2bba05bc714308aba1c612e06e40d849fd654296f549c697d3bd742\n",
      "  Stored in directory: /tmp/pip-ephem-wheel-cache-n_iflnui/wheels/bb/1f/99/ff8594e48ec11df99af6e0ee8611a5e560e9f44d1a3fefb351\n",
      "Successfully built greenplum-python\n",
      "Installing collected packages: dill, psycopg2-binary, greenplum-python\n",
      "Successfully installed dill-0.3.6 greenplum-python-1.0.1 psycopg2-binary-2.9.5\n",
      "\u001b[33mWARNING: You are using pip version 20.2.4; however, version 23.2.1 is available.\n",
      "You should consider upgrading via the '/tmp/test_venv/bin/python3 -m pip install --upgrade pip' command.\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "%cd ../../../\n",
    "!python3 -m pip install --upgrade ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "import greenplumpython as gp\n",
    "\n",
    "db = gp.database(\"postgresql://localhost:7000\")\n",
    "\n",
    "\n",
    "@gp.create_function\n",
    "def fake_name() -> str:\n",
    "    from faker import Faker  # type: ignore reportMissingImports\n",
    "\n",
    "    fake = Faker()\n",
    "    return fake.name()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The UDF `fake_name()` generates fake names at random. This can be helpful for anonymizing the data.\n",
    "\n",
    "However, if we try to call this UDF, we will get an error:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "ename": "ExternalRoutineException",
     "evalue": "ModuleNotFoundError: No module named 'faker'\nCONTEXT:  Traceback (most recent call last):\n  PL/Python function \"func_16dc1c114c0344938bc1d85945e9f0ad\", line 16, in <module>\n    return GD['__func_16dc1c114c0344938bc1d85945e9f0ad']()\n  PL/Python function \"func_16dc1c114c0344938bc1d85945e9f0ad\", line 6, in fake_name\nPL/Python function \"func_16dc1c114c0344938bc1d85945e9f0ad\"\n",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mExternalRoutineException\u001b[0m                  Traceback (most recent call last)",
      "File \u001b[0;32m/tmp/test_venv/lib64/python3.9/site-packages/IPython/core/formatters.py:708\u001b[0m, in \u001b[0;36mPlainTextFormatter.__call__\u001b[0;34m(self, obj)\u001b[0m\n\u001b[1;32m    701\u001b[0m stream \u001b[38;5;241m=\u001b[39m StringIO()\n\u001b[1;32m    702\u001b[0m printer \u001b[38;5;241m=\u001b[39m pretty\u001b[38;5;241m.\u001b[39mRepresentationPrinter(stream, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mverbose,\n\u001b[1;32m    703\u001b[0m     \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmax_width, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mnewline,\n\u001b[1;32m    704\u001b[0m     max_seq_length\u001b[38;5;241m=\u001b[39m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmax_seq_length,\n\u001b[1;32m    705\u001b[0m     singleton_pprinters\u001b[38;5;241m=\u001b[39m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39msingleton_printers,\n\u001b[1;32m    706\u001b[0m     type_pprinters\u001b[38;5;241m=\u001b[39m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mtype_printers,\n\u001b[1;32m    707\u001b[0m     deferred_pprinters\u001b[38;5;241m=\u001b[39m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mdeferred_printers)\n\u001b[0;32m--> 708\u001b[0m \u001b[43mprinter\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mpretty\u001b[49m\u001b[43m(\u001b[49m\u001b[43mobj\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    709\u001b[0m printer\u001b[38;5;241m.\u001b[39mflush()\n\u001b[1;32m    710\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m stream\u001b[38;5;241m.\u001b[39mgetvalue()\n",
      "File \u001b[0;32m/tmp/test_venv/lib64/python3.9/site-packages/IPython/lib/pretty.py:410\u001b[0m, in \u001b[0;36mRepresentationPrinter.pretty\u001b[0;34m(self, obj)\u001b[0m\n\u001b[1;32m    407\u001b[0m                         \u001b[38;5;28;01mreturn\u001b[39;00m meth(obj, \u001b[38;5;28mself\u001b[39m, cycle)\n\u001b[1;32m    408\u001b[0m                 \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mcls\u001b[39m \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28mobject\u001b[39m \\\n\u001b[1;32m    409\u001b[0m                         \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;28mcallable\u001b[39m(\u001b[38;5;28mcls\u001b[39m\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__dict__\u001b[39m\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m__repr__\u001b[39m\u001b[38;5;124m'\u001b[39m)):\n\u001b[0;32m--> 410\u001b[0m                     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m_repr_pprint\u001b[49m\u001b[43m(\u001b[49m\u001b[43mobj\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcycle\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    412\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m _default_pprint(obj, \u001b[38;5;28mself\u001b[39m, cycle)\n\u001b[1;32m    413\u001b[0m \u001b[38;5;28;01mfinally\u001b[39;00m:\n",
      "File \u001b[0;32m/tmp/test_venv/lib64/python3.9/site-packages/IPython/lib/pretty.py:778\u001b[0m, in \u001b[0;36m_repr_pprint\u001b[0;34m(obj, p, cycle)\u001b[0m\n\u001b[1;32m    776\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"A pprint that just redirects to the normal repr function.\"\"\"\u001b[39;00m\n\u001b[1;32m    777\u001b[0m \u001b[38;5;66;03m# Find newlines and replace them with p.break_()\u001b[39;00m\n\u001b[0;32m--> 778\u001b[0m output \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mrepr\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mobj\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    779\u001b[0m lines \u001b[38;5;241m=\u001b[39m output\u001b[38;5;241m.\u001b[39msplitlines()\n\u001b[1;32m    780\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m p\u001b[38;5;241m.\u001b[39mgroup():\n",
      "File \u001b[0;32m~/GreenplumPython/greenplumpython/dataframe.py:220\u001b[0m, in \u001b[0;36mDataFrame.__repr__\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m    213\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m__repr__\u001b[39m(\u001b[38;5;28mself\u001b[39m) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m \u001b[38;5;28mstr\u001b[39m:\n\u001b[1;32m    214\u001b[0m     \u001b[38;5;66;03m# noqa\u001b[39;00m\n\u001b[1;32m    215\u001b[0m \u001b[38;5;250m    \u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m    216\u001b[0m \u001b[38;5;124;03m    :meta private:\u001b[39;00m\n\u001b[1;32m    217\u001b[0m \n\u001b[1;32m    218\u001b[0m \u001b[38;5;124;03m    Return a string representation for a dataframe\u001b[39;00m\n\u001b[1;32m    219\u001b[0m \u001b[38;5;124;03m    \"\"\"\u001b[39;00m\n\u001b[0;32m--> 220\u001b[0m     contents \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mlist\u001b[39;49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[1;32m    221\u001b[0m     row_num_string \u001b[38;5;241m=\u001b[39m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m(\u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mlen\u001b[39m(contents)\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m row\u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124ms\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mif\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28mlen\u001b[39m(contents)\u001b[38;5;250m \u001b[39m\u001b[38;5;241m!=\u001b[39m\u001b[38;5;250m \u001b[39m\u001b[38;5;241m1\u001b[39m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01melse\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m)\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m    222\u001b[0m     \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(contents) \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m0\u001b[39m:  \u001b[38;5;66;03m# DataFrame is empty\u001b[39;00m\n",
      "File \u001b[0;32m~/GreenplumPython/greenplumpython/dataframe.py:749\u001b[0m, in \u001b[0;36mDataFrame.__iter__\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m    747\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m DataFrame\u001b[38;5;241m.\u001b[39mIterator(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_contents)\n\u001b[1;32m    748\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_db \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[0;32m--> 749\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_contents \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_fetch\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    750\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_contents \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m    751\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m DataFrame\u001b[38;5;241m.\u001b[39mIterator(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_contents)\n",
      "File \u001b[0;32m~/GreenplumPython/greenplumpython/dataframe.py:873\u001b[0m, in \u001b[0;36mDataFrame._fetch\u001b[0;34m(self, is_all)\u001b[0m\n\u001b[1;32m    868\u001b[0m output_name \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcte_\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;241m+\u001b[39m uuid4()\u001b[38;5;241m.\u001b[39mhex\n\u001b[1;32m    869\u001b[0m to_json_dataframe \u001b[38;5;241m=\u001b[39m DataFrame(\n\u001b[1;32m    870\u001b[0m     \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mSELECT to_json(\u001b[39m\u001b[38;5;132;01m{\u001b[39;00moutput_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m)::TEXT FROM \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m AS \u001b[39m\u001b[38;5;132;01m{\u001b[39;00moutput_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m    871\u001b[0m     parents\u001b[38;5;241m=\u001b[39m[\u001b[38;5;28mself\u001b[39m],\n\u001b[1;32m    872\u001b[0m )\n\u001b[0;32m--> 873\u001b[0m result \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_db\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_execute\u001b[49m\u001b[43m(\u001b[49m\u001b[43mto_json_dataframe\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_serialize\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    874\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m result \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(result, Iterable) \u001b[38;5;28;01melse\u001b[39;00m []\n",
      "File \u001b[0;32m~/GreenplumPython/greenplumpython/db.py:77\u001b[0m, in \u001b[0;36mDatabase._execute\u001b[0;34m(self, query, has_results)\u001b[0m\n\u001b[1;32m     75\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m config\u001b[38;5;241m.\u001b[39mprint_sql:\n\u001b[1;32m     76\u001b[0m     \u001b[38;5;28mprint\u001b[39m(query)\n\u001b[0;32m---> 77\u001b[0m \u001b[43mcursor\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mexecute\u001b[49m\u001b[43m(\u001b[49m\u001b[43mquery\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m     78\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m cursor\u001b[38;5;241m.\u001b[39mfetchall() \u001b[38;5;28;01mif\u001b[39;00m has_results \u001b[38;5;28;01melse\u001b[39;00m cursor\u001b[38;5;241m.\u001b[39mrowcount\n",
      "File \u001b[0;32m/tmp/test_venv/lib64/python3.9/site-packages/psycopg2/extras.py:236\u001b[0m, in \u001b[0;36mRealDictCursor.execute\u001b[0;34m(self, query, vars)\u001b[0m\n\u001b[1;32m    234\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mcolumn_mapping \u001b[38;5;241m=\u001b[39m []\n\u001b[1;32m    235\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_query_executed \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mTrue\u001b[39;00m\n\u001b[0;32m--> 236\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mexecute\u001b[49m\u001b[43m(\u001b[49m\u001b[43mquery\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mvars\u001b[39;49m\u001b[43m)\u001b[49m\n",
      "\u001b[0;31mExternalRoutineException\u001b[0m: ModuleNotFoundError: No module named 'faker'\nCONTEXT:  Traceback (most recent call last):\n  PL/Python function \"func_16dc1c114c0344938bc1d85945e9f0ad\", line 16, in <module>\n    return GD['__func_16dc1c114c0344938bc1d85945e9f0ad']()\n  PL/Python function \"func_16dc1c114c0344938bc1d85945e9f0ad\", line 6, in fake_name\nPL/Python function \"func_16dc1c114c0344938bc1d85945e9f0ad\"\n"
     ]
    },
    {
     "ename": "ExternalRoutineException",
     "evalue": "ModuleNotFoundError: No module named 'faker'\nCONTEXT:  Traceback (most recent call last):\n  PL/Python function \"func_16dc1c114c0344938bc1d85945e9f0ad\", line 3, in <module>\n    return GD['__func_16dc1c114c0344938bc1d85945e9f0ad']()\n  PL/Python function \"func_16dc1c114c0344938bc1d85945e9f0ad\", line 6, in fake_name\nPL/Python function \"func_16dc1c114c0344938bc1d85945e9f0ad\"\n",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mExternalRoutineException\u001b[0m                  Traceback (most recent call last)",
      "File \u001b[0;32m/tmp/test_venv/lib64/python3.9/site-packages/IPython/core/formatters.py:344\u001b[0m, in \u001b[0;36mBaseFormatter.__call__\u001b[0;34m(self, obj)\u001b[0m\n\u001b[1;32m    342\u001b[0m     method \u001b[38;5;241m=\u001b[39m get_real_method(obj, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mprint_method)\n\u001b[1;32m    343\u001b[0m     \u001b[38;5;28;01mif\u001b[39;00m method \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m--> 344\u001b[0m         \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mmethod\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    345\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m    346\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n",
      "File \u001b[0;32m~/GreenplumPython/greenplumpython/dataframe.py:281\u001b[0m, in \u001b[0;36mDataFrame._repr_html_\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m    279\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\":meta private:\"\"\"\u001b[39;00m\n\u001b[1;32m    280\u001b[0m repr_html_str \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m--> 281\u001b[0m ret \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mlist\u001b[39;49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[1;32m    282\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(ret) \u001b[38;5;241m!=\u001b[39m \u001b[38;5;241m0\u001b[39m:\n\u001b[1;32m    283\u001b[0m     repr_html_str \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m<table>\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\n",
      "File \u001b[0;32m~/GreenplumPython/greenplumpython/dataframe.py:749\u001b[0m, in \u001b[0;36mDataFrame.__iter__\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m    747\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m DataFrame\u001b[38;5;241m.\u001b[39mIterator(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_contents)\n\u001b[1;32m    748\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_db \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[0;32m--> 749\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_contents \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_fetch\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    750\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_contents \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m    751\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m DataFrame\u001b[38;5;241m.\u001b[39mIterator(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_contents)\n",
      "File \u001b[0;32m~/GreenplumPython/greenplumpython/dataframe.py:873\u001b[0m, in \u001b[0;36mDataFrame._fetch\u001b[0;34m(self, is_all)\u001b[0m\n\u001b[1;32m    868\u001b[0m output_name \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcte_\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;241m+\u001b[39m uuid4()\u001b[38;5;241m.\u001b[39mhex\n\u001b[1;32m    869\u001b[0m to_json_dataframe \u001b[38;5;241m=\u001b[39m DataFrame(\n\u001b[1;32m    870\u001b[0m     \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mSELECT to_json(\u001b[39m\u001b[38;5;132;01m{\u001b[39;00moutput_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m)::TEXT FROM \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m AS \u001b[39m\u001b[38;5;132;01m{\u001b[39;00moutput_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m    871\u001b[0m     parents\u001b[38;5;241m=\u001b[39m[\u001b[38;5;28mself\u001b[39m],\n\u001b[1;32m    872\u001b[0m )\n\u001b[0;32m--> 873\u001b[0m result \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_db\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_execute\u001b[49m\u001b[43m(\u001b[49m\u001b[43mto_json_dataframe\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_serialize\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    874\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m result \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(result, Iterable) \u001b[38;5;28;01melse\u001b[39;00m []\n",
      "File \u001b[0;32m~/GreenplumPython/greenplumpython/db.py:77\u001b[0m, in \u001b[0;36mDatabase._execute\u001b[0;34m(self, query, has_results)\u001b[0m\n\u001b[1;32m     75\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m config\u001b[38;5;241m.\u001b[39mprint_sql:\n\u001b[1;32m     76\u001b[0m     \u001b[38;5;28mprint\u001b[39m(query)\n\u001b[0;32m---> 77\u001b[0m \u001b[43mcursor\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mexecute\u001b[49m\u001b[43m(\u001b[49m\u001b[43mquery\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m     78\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m cursor\u001b[38;5;241m.\u001b[39mfetchall() \u001b[38;5;28;01mif\u001b[39;00m has_results \u001b[38;5;28;01melse\u001b[39;00m cursor\u001b[38;5;241m.\u001b[39mrowcount\n",
      "File \u001b[0;32m/tmp/test_venv/lib64/python3.9/site-packages/psycopg2/extras.py:236\u001b[0m, in \u001b[0;36mRealDictCursor.execute\u001b[0;34m(self, query, vars)\u001b[0m\n\u001b[1;32m    234\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mcolumn_mapping \u001b[38;5;241m=\u001b[39m []\n\u001b[1;32m    235\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_query_executed \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mTrue\u001b[39;00m\n\u001b[0;32m--> 236\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mexecute\u001b[49m\u001b[43m(\u001b[49m\u001b[43mquery\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mvars\u001b[39;49m\u001b[43m)\u001b[49m\n",
      "\u001b[0;31mExternalRoutineException\u001b[0m: ModuleNotFoundError: No module named 'faker'\nCONTEXT:  Traceback (most recent call last):\n  PL/Python function \"func_16dc1c114c0344938bc1d85945e9f0ad\", line 3, in <module>\n    return GD['__func_16dc1c114c0344938bc1d85945e9f0ad']()\n  PL/Python function \"func_16dc1c114c0344938bc1d85945e9f0ad\", line 6, in fake_name\nPL/Python function \"func_16dc1c114c0344938bc1d85945e9f0ad\"\n"
     ]
    }
   ],
   "source": [
    "db.apply(lambda: fake_name())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From the error message\n",
    "\n",
    "> ModuleNotFoundError: No module named 'faker'\n",
    "\n",
    "we learn that the error is due to missing of the module `faker`. We can fix it by installing it on server."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Installing Python Packages\n",
    "\n",
    "To install the package on server, we can simply call `Database.install_packages()`. \n",
    "\n",
    "The packages will be installed to the currently activated environment. If there is no virtual environment activated, the packages will be installed to the [user's site-packages directory](https://docs.python.org/3/library/site.html#site.USER_SITE) if the normal (system) site-packages directory is not writeable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "import greenplumpython.experimental.file\n",
    "\n",
    "db.install_packages(\"faker==19.6.1\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The installation succeeded if no error showed up. We can verify it by running `fake_name()` again:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "\t<tr>\n",
       "\t\t<th>name</th>\n",
       "\t</tr>\n",
       "\t<tr>\n",
       "\t\t<td>Melinda Tran</td>\n",
       "\t</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "--------------\n",
       " name         \n",
       "--------------\n",
       " Melinda Tran \n",
       "--------------\n",
       "(1 row)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "db.apply(lambda: fake_name(), column_name=\"name\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
